8 research outputs found
Severity Classification of Parkinson's Disease from Speech using Single Frequency Filtering-based Features
Developing objective methods for assessing the severity of Parkinson's
disease (PD) is crucial for improving its diagnosis and treatment. This study
proposes two sets of novel features derived from the single frequency filtering
(SFF) method: (1) SFF cepstral coefficients (SFFCC) and (2) MFCCs from the SFF
(MFCC-SFF) for the severity classification of PD. Prior studies have
demonstrated that SFF offers greater spectro-temporal resolution compared to
the short-time Fourier transform. The study uses the PC-GITA database, which
includes speech of PD patients and healthy controls produced in three speaking
tasks (vowels, sentences, text reading). Experiments using the SVM classifier
revealed that the proposed features outperformed the conventional MFCCs in all
three speaking tasks. The proposed SFFCC and MFCC-SFF features gave relative
improvements of 5.8% and 2.3% for the vowel task, 7.0% and 1.8% for the sentence
task, and 2.4% and 1.1% for the read text task, in comparison to the MFCC features.
Comment: Accepted by INTERSPEECH 202
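The cepstral recipe behind features like SFFCC can be sketched as follows: take a magnitude spectrum, apply log compression, and decorrelate with a DCT. This is only an illustration of the general idea; the SFF envelope extraction used in the paper is not reproduced, and the random spectrum below is a stand-in for a real speech frame.

```python
import numpy as np
from scipy.fft import dct

def cepstral_coefficients(mag_spectrum, n_coeffs=13):
    """Cepstral coefficients from a magnitude spectrum: log, then DCT.

    The same recipe applies to any spectral front end (STFT, SFF, ...);
    SFFCC would replace this magnitude spectrum with the SFF envelope.
    """
    log_spec = np.log(mag_spectrum + 1e-10)      # compress dynamic range
    return dct(log_spec, type=2, norm='ortho')[:n_coeffs]

# Toy example: one frame of a 256-bin magnitude spectrum.
rng = np.random.default_rng(0)
frame = np.abs(rng.standard_normal(256))
ccs = cepstral_coefficients(frame)
print(ccs.shape)  # (13,)
```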
Wav2vec-based Detection and Severity Level Classification of Dysarthria from Speech
Automatic detection and severity level classification of dysarthria directly
from acoustic speech signals can be used as a tool in medical diagnosis. In
this work, the pre-trained wav2vec 2.0 model is studied as a feature extractor
to build detection and severity level classification systems for dysarthric
speech. The experiments were carried out with the widely used UA-Speech
database. In the detection experiments, the results revealed that the best
performance was obtained using the embeddings from the first layer of the
wav2vec model that yielded an absolute improvement of 1.23% in accuracy
compared to the best performing baseline feature (spectrogram). In the studied
severity level classification task, the results revealed that the embeddings
from the final layer gave an absolute improvement of 10.62% in accuracy
compared to the best baseline features (mel-frequency cepstral coefficients).
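Layer-wise embeddings of the kind described above are typically turned into one utterance-level vector by pooling over frames. The sketch below shows only that pooling step on synthetic hidden states whose shapes mimic a wav2vec2-base model (13 layers, 768 dimensions); the actual model inference is omitted, so these names and sizes are assumptions, not the paper's code.

```python
import numpy as np

def utterance_embedding(hidden_states, layer):
    """Mean-pool frame-level embeddings from one chosen layer.

    hidden_states: list of (n_frames, dim) arrays, one per layer, as a
    wav2vec 2.0-style model would expose them; pooling over time gives
    a fixed-length vector for a downstream classifier.
    """
    return hidden_states[layer].mean(axis=0)

# Synthetic stand-in: 13 layers, 50 frames, 768-dim features.
rng = np.random.default_rng(1)
states = [rng.standard_normal((50, 768)) for _ in range(13)]
first_layer_vec = utterance_embedding(states, layer=1)
print(first_layer_vec.shape)  # (768,)
```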
Automatic Classification of Vocal Intensity Category from Speech
Vocal intensity regulation is a fundamental phenomenon in speech communication. In speech science, the term vocal intensity refers to the acoustic energy of speech, and it is quantified by sound pressure level (SPL). Unlike, for example, loudspeaker amplifiers, which adjust sound intensity by affecting only the gain, the regulation of intensity in speech is much more complex and challenging because it is based on the physiological speech production mechanism. The speech signal carries acoustic cues about the vocal intensity category/SPL that the speaker used when the corresponding speech signal was produced. Due to the lack of proper calibration information in existing speech databases, it is not possible to estimate the true vocal intensity category/SPL used in recordings. In addition, there is only one previous study on the automatic classification of vocal intensity category. In the current study, a large speech database representing four vocal intensity categories (soft, normal, loud, and very loud) was recorded from 50 speakers together with calibration information. Two automatic machine learning-based classification systems were developed using support vector machines (SVMs) and convolutional neural networks (CNNs), with mel-frequency cepstral coefficients (MFCCs) as features. The results show that the best classification accuracy (about 65%) was obtained using the SVM classifier.
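A pipeline of the kind described (MFCC-type features into an SVM for a four-category problem) can be sketched with scikit-learn. The synthetic features, class separation, and hyperparameters below are illustrative assumptions, not the study's data or settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for utterance-level MFCC vectors: four intensity
# categories (soft, normal, loud, very loud), 200 samples each, with
# class means shifted so the problem is learnable.
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((200, 13)) + c for c in range(4)])
y = np.repeat(np.arange(4), 200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Standardize features, then fit an RBF-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0))
clf.fit(X_tr, y_tr)
print(round(clf.score(X_te, y_te), 2))
```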
Automatic classification of the severity level of Parkinson’s disease: A comparison of speaking tasks, features, and classifiers
Automatic speech-based severity level classification of Parkinson's disease (PD) enables objective assessment and earlier diagnosis. While many studies have been conducted on the binary classification task to distinguish speakers with PD from healthy controls (HCs), clearly fewer studies have addressed multi-class PD severity level classification. Furthermore, in studying the three main issues of speech-based classification systems (speaking tasks, features, and classifiers), previous investigations of severity level classification have yielded inconclusive results due to the use of only a few, and sometimes just one, type of speaking task, feature, or classifier in each study. Hence, a systematic comparison is conducted in this study between different speaking tasks, features, and classifiers. Five speaking tasks (vowel task, sentence task, diadochokinetic (DDK) task, read text task, and monologue task), four features (phonation, articulation, prosody, and their fusion), and four classifier architectures (support vector machine (SVM), random forest (RF), multilayer perceptron (MLP), and AdaBoost) were compared. The classification task studied was a 3-class problem: classifying PD severity level as healthy vs. mild vs. severe. Two MDS-UPDRS scales (MDS-UPDRS-III and MDS-UPDRS-S) were used for the ground truth severity level labels. The results showed that the use of the monologue task and the articulation and fusion features improved classification accuracy significantly compared to the use of the other speaking tasks and features. The best classification systems resulted in an accuracy of 58% (using the monologue task with the articulation features) for the MDS-UPDRS-III scale and 56% (using the monologue task with the fusion of features) for the MDS-UPDRS-S scale. Peer reviewed.
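A systematic comparison of the four classifier architectures named above can be sketched with cross-validation in scikit-learn. The synthetic three-class features below stand in for the healthy/mild/severe problem; dimensions, separations, and hyperparameters are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Synthetic 3-class stand-in for healthy vs. mild vs. severe features.
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((100, 20)) + 0.8 * c for c in range(3)])
y = np.repeat(np.arange(3), 100)

classifiers = {
    'SVM': SVC(),
    'RF': RandomForestClassifier(random_state=0),
    'MLP': MLPClassifier(max_iter=500, random_state=0),
    'AdaBoost': AdaBoostClassifier(random_state=0),
}

# 5-fold cross-validated accuracy for each architecture.
scores = {}
for name, clf in classifiers.items():
    scores[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(f'{name}: {scores[name]:.2f}')
```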
Classification of vocal intensity category from speech using the wav2vec2 and whisper embeddings
In speech communication, talkers regulate vocal intensity, resulting in speech signals of different intensity categories (e.g., soft, loud). Intensity category carries important information about the speaker's health and emotions. However, many speech databases lack calibration information, and therefore sound pressure level cannot be measured from the recorded data. Machine learning, however, can be used in intensity category classification even though calibration information is not available. This study investigates pre-trained model embeddings (Wav2vec2 and Whisper) in the classification of vocal intensity category (soft, normal, loud, and very loud) from speech signals expressed on arbitrary amplitude scales. We use a new database consisting of two speaking tasks (sentence and paragraph). A support vector machine is used as the classifier. Our results show that the pre-trained model embeddings outperformed three baseline features, providing improvements of up to 7% (absolute) in accuracy. Peer reviewed.
Towards battery-less RF sensing
Funding Information: The authors appreciate partial funding from the Academy of Finland project ABACUS. Publisher Copyright: © 2021 IEEE. Recent work has demonstrated the use of the radio interface as a sensing modality for gestures, activities, and situational perception. The field generally moves towards larger bandwidths, multiple antennas, and higher mmWave frequency domains, which allow for the recognition of minute movements. We envision another set of applications for RF sensing: battery-less autonomous sensing devices. In this work, we investigate transceiver-less passive RF sensors that are excited by fluctuations of the received power over the wireless channel. In particular, we demonstrate the use of battery-less RF sensing for on-body gesture recognition integrated into a smart garment, as well as the integration of such sensing capabilities into smart surfaces. Peer reviewed.
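Since the sensing principle above rests on fluctuations of received power, a minimal software-side illustration is to flag motion by thresholding the sliding-window variance of a power trace. This is a generic sketch on simulated data, not the paper's hardware or signal processing; the trace, window length, and threshold are all assumptions.

```python
import numpy as np

# Simulated received-power trace (dBm): a quiet channel with a burst of
# channel fluctuation standing in for a gesture.
rng = np.random.default_rng(0)
rssi = rng.normal(-60.0, 0.2, 400)                            # quiet
rssi[150:250] += 3 * np.sin(np.linspace(0, 6 * np.pi, 100))   # gesture

# Sliding-window variance; high variance marks channel fluctuation.
win = 20
var = np.array([rssi[i:i + win].var() for i in range(len(rssi) - win)])
motion = var > 1.0   # simple, hand-picked threshold
print(motion.any())  # True
```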
Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers
This work was supported by the Academy of Finland (grant number 313390). The computational resources were provided by Aalto ScienceIT. The present study investigates the use of 1-dimensional (1-D) and 2-dimensional (2-D) spectral feature representations in voice pathology detection with several classical machine learning (ML) and recent deep learning (DL) classifiers. Four widely used spectral feature representations (static mel-frequency cepstral coefficients (MFCCs), dynamic MFCCs, spectrogram, and mel-spectrogram) are derived in both the 1-D and 2-D form from voice signals. Three widely used ML classifiers (support vector machine (SVM), random forest (RF), and AdaBoost) and three DL classifiers (deep neural network (DNN), long short-term memory (LSTM) network, and convolutional neural network (CNN)) are used with the 1-D feature representations. In addition, CNN classifiers are built using the 2-D feature representations. The widely used HUPA database is considered in the pathology detection experiments. Experimental results revealed that using the CNN classifier with the 2-D feature representations yielded better accuracy compared to using the ML and DL classifiers with the 1-D feature representations. The best performance was achieved using the 2-D CNN classifier based on dynamic MFCCs, which showed a detection accuracy of 81%. Peer reviewed.
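The 1-D versus 2-D distinction above can be sketched in a few lines: a 2-D representation keeps the full time-frequency matrix for a CNN, while a 1-D representation collapses it to a fixed-length vector for classical classifiers. The shapes and the time-averaging choice below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

# Toy mel-spectrogram stand-in (40 mel bands x 120 frames); a real one
# would come from an audio front end.
rng = np.random.default_rng(0)
mel_spec = np.abs(rng.standard_normal((40, 120)))

# 2-D representation: keep the time-frequency matrix and add a channel
# axis, giving an image-like input for a 2-D CNN.
x_2d = mel_spec[np.newaxis, :, :]

# 1-D representation: average over time to get a fixed-length vector
# suitable for classical classifiers such as SVM or RF.
x_1d = mel_spec.mean(axis=1)

print(x_2d.shape, x_1d.shape)  # (1, 40, 120) (40,)
```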
Motion pattern recognition in 4D point clouds
We address an actively discussed problem in signal processing: recognizing patterns from spatial data in motion. In particular, we suggest a neural network architecture to recognize motion patterns from 4D point clouds. We demonstrate the feasibility of our approach with point cloud datasets of hand gestures. The architecture, PointGest, directly feeds on unprocessed timelines of point cloud data without any need for voxelization or projection. The model is resilient to noise in the input point cloud through abstraction to lower-density representations, especially for regions of high density. We evaluate the architecture on a benchmark dataset with ten gestures. PointGest achieves an accuracy of 98.8%, outperforming five state-of-the-art point cloud classification models. Peer reviewed.
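The input format described above, a timeline of raw point sets with no voxelization or projection, can be illustrated as a simple tensor layout. The frame and point counts below are arbitrary stand-ins, not the benchmark's actual dimensions.

```python
import numpy as np

# A gesture as a "4D point cloud": a timeline of unordered 3-D point
# sets. Here 30 frames of 256 points each are stacked into a single
# (frames, points, xyz) tensor that a network can consume directly,
# with no voxelization or projection step.
rng = np.random.default_rng(0)
frames = [rng.standard_normal((256, 3)) for _ in range(30)]
clip = np.stack(frames)
print(clip.shape)  # (30, 256, 3)
```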